Goto

Collaborating Authors

 rl policy



Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach Rory Young Nicolas Pugeault School of Computing Science University of Glasgow

Neural Information Processing Systems

Deep reinforcement learning agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this dichotomy is because the learnt policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks.



Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach

Neural Information Processing Systems

Deep reinforcement learning agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this dichotomy is because the learnt policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks. We demonstrate that RL policies can be deterministically chaotic, as small perturbations to the system state have a large impact on subsequent state and reward trajectories. This unstable non-linear behaviour has two consequences: first, inaccuracies in sensor readings, or adversarial attacks, can cause significant performance degradation; second, even policies that show robust performance in terms of rewards may have unpredictable behaviour in practice. These two facets of chaos in RL policies drastically restrict the application of deep RL to real-world problems. To address this issue, we propose an improvement on the successful Dreamer V3 architecture, implementing Maximal Lyapunov Exponent regularisation. This new approach reduces the chaotic state dynamics, rendering the learnt policies more resilient to sensor noise or adversarial attacks and thereby improving the suitability of deep reinforcement learning for real-world applications.


VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture

Wang, Maonan, Chen, Yirong, Pang, Aoyu, Cai, Yuxin, Chen, Chung Shue, Kan, Yuheng, Pun, Man-On

arXiv.org Artificial Intelligence

Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods - ranging from rule-based heuristics to reinforcement learning (RL) - often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.


Crossing the Sim2Real Gap Between Simulation and Ground Testing to Space Deployment of Autonomous Free-flyer Control

Stewart, Kenneth, Chapin, Samantha, Leontie, Roxana, Henshaw, Carl Glen

arXiv.org Artificial Intelligence

Abstract-- Reinforcement learning (RL) offers transforma-tive potential for robotic control in space. We present the first on-orbit demonstration of RL-based autonomous control of a free-flying robot, the NASA Astrobee, aboard the International Space Station (ISS). Using NVIDIA's Omniverse physics simulator and curriculum learning, we trained a deep neural network to replace Astrobee's standard attitude and translation control, enabling it to navigate in microgravity. This successful deployment demonstrates the feasibility of training RL policies terrestrially and transferring them to space-based applications. This paves the way for future work in In-Space Servicing, Assembly, and Manufacturing (ISAM), enabling rapid on-orbit adaptation to dynamic mission requirements. Future In-Space Servicing, Assembly, and Manufacturing (ISAM) missions require increasingly autonomous robotic systems capable of adapting to the dynamic and uncertain conditions of space.


Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) International Space Station Astrobee Testing

Chapin, Samantha, Stewart, Kenneth, Leontie, Roxana, Henshaw, Carl Glen

arXiv.org Artificial Intelligence

The US Naval Research Laboratory's (NRL's) Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) experiment pioneers the use of reinforcement learning (RL) for control of free-flying robots in the zero-gravity (zero-G) environment of space. On Tuesday, May 27th 2025 the APIARY team conducted the first ever, to our knowledge, RL control of a free-flyer in space using the NASA Astrobee robot on-board the International Space Station (ISS). A robust 6-degrees of freedom (DOF) control policy was trained using an actor-critic Proximal Policy Optimization (PPO) network within the NVIDIA Isaac Lab simulation environment, randomizing over goal poses and mass distributions to enhance robustness. This paper details the simulation testing, ground testing, and flight validation of this experiment. This on-orbit demonstration validates the transformative potential of RL for improving robotic autonomy, enabling rapid development and deployment (in minutes to hours) of tailored behaviors for space exploration, logistics, and real-time mission needs.


Coordinating Spinal and Limb Dynamics for Enhanced Sprawling Robot Mobility

Atasever, Merve, Okhovat, Ali, Nazaripouya, Azhang, Nisbet, John, Kurkutlu, Omer, Deshmukh, Jyotirmoy V., Aydin, Yasemin Ozkan

arXiv.org Artificial Intelligence

Sprawling locomotion in vertebrates, particularly salamanders, demonstrates how body undulation and spinal mobility enhance stability, maneuverability, and adaptability across complex terrains. While prior work has separately explored biologically inspired gait design or deep reinforcement learning (DRL), these approaches face inherent limitations: open-loop gait designs often lack adaptability to unforeseen terrain variations, whereas end-to-end DRL methods are data-hungry and prone to unstable behaviors when transferring from simulation to real robots. We propose a hybrid control framework that integrates Hildebrand's biologically grounded gait design with DRL, enabling a salamander-inspired quadruped robot to exploit active spinal joints for robust crawling motion. Our evaluation across multiple robot configurations in target-directed navigation tasks reveals that this hybrid approach systematically improves robustness under environmental uncertainties such as surface irregularities. By bridging structured gait design with learning-based methodology, our work highlights the promise of interdisciplinary control strategies for developing efficient, resilient, and biologically informed spinal actuation in robotic systems.


Federated Deep Reinforcement Learning for Privacy-Preserving Robotic-Assisted Surgery

Hafeez, Sana, Mulkana, Sundas Rafat, Imran, Muhammad Ali, Sevegnani, Michele

arXiv.org Artificial Intelligence

The integration of Reinforcement Learning (RL) into robotic-assisted surgery (RAS) holds significant promise for advancing surgical precision, adaptability, and autonomous decision-making. However, the development of robust RL models in clinical settings is hindered by key challenges, including stringent patient data privacy regulations, limited access to diverse surgical datasets, and high procedural variability. To address these limitations, this paper presents a Federated Deep Reinforcement Learning (FDRL) framework that enables decentralized training of RL models across multiple healthcare institutions without exposing sensitive patient information. A central innovation of the proposed framework is its dynamic policy adaptation mechanism, which allows surgical robots to select and tailor patient-specific policies in real-time, thereby ensuring personalized and Optimised interventions. To uphold rigorous privacy standards while facilitating collaborative learning, the FDRL framework incorporates secure aggregation, differential privacy, and homomorphic encryption techniques. Experimental results demonstrate a 60\% reduction in privacy leakage compared to conventional methods, with surgical precision maintained within a 1.5\% margin of a centralized baseline. This work establishes a foundational approach for adaptive, secure, and patient-centric AI-driven surgical robotics, offering a pathway toward clinical translation and scalable deployment across diverse healthcare environments.


Interpret Policies in Deep Reinforcement Learning using SILVER with RL-Guided Labeling: A Model-level Approach to High-dimensional and Multi-action Environments

Qian, Yiyu, Nguyen, Su, Chen, Chao, Zhou, Qinyue, Zhao, Liyuan

arXiv.org Artificial Intelligence

Deep reinforcement learning (RL) achieves remarkable performance but lacks interpretability, limiting trust in policy behavior. The existing SIL VER framework (Li, Siddique, and Cao 2025) explains RL policy via Shapley-based regression but remains restricted to low-dimensional, binary-action domains. We propose SIL VER with RL-guided labeling, an enhanced variant that extends SIL VER to multi-action and high-dimensional environments by incorporating the RL policy's own action outputs into the boundary points identification. Our method first extracts compact feature representations from image observations, performs SHAP-based feature attribution, and then employs RL-guided labeling to generate behaviorally consistent boundary datasets. Surrogate models, such as decision trees and regression-based functions, are subsequently trained to interpret RL policy's decision structure. We evaluate the proposed framework on two Atari environments using three deep RL algorithms and conduct human-subject study to assess the clarity and trustworthiness of the derived interpretable policy. Results show that our approach maintains competitive task performance while substantially improving transparency and human understanding of agent behavior.